weak linear function approximation
Efficient Planning in Large MDPs with Weak Linear Function Approximation
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.
Review for NeurIPS paper: Efficient Planning in Large MDPs with Weak Linear Function Approximation
Additional Feedback: After rebuttal: I read the author response and other reviews. It would be great to see more discussions in the next version. I think this paper has enough contribution and I will keep my original rating for acceptance. This paper is well written, and easy to follow. The main text is clean and the proof is deferred to the appendix.
Review for NeurIPS paper: Efficient Planning in Large MDPs with Weak Linear Function Approximation
All reviewers agree that the paper makes a nice contribution to planning with function approximation. In particular, the paper considers an important open problem, and while the problem is solved by making a few assumptions (mostly notably the core states), the results have made significant progress on the important problem. The reviewers also appreciate the use of precise language and careful description of related work. Among the remaining concerns, R2 wants to see some evidence of robustness against the failure of the "core state" assumption. While performing empirical experiments may not fit the theoretical nature of the paper, the authors can consider a theoretical justification: namely, define a notion of error that measures how much the core-states assumption is violated, and show how such an error manifest itself in the final guarantee.
Efficient Planning in Large MDPs with Weak Linear Function Approximation
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.